A Brief Introduction to Git Data Structures
TLDR
- All Git data and version control history are stored in the `.git` directory; deleting this directory deletes the repository's local version history.
- The `objects` folder stores objects named by their SHA-1 hashes, mainly of three types: Blobs (file content), Trees (directory structure), and Commits (commit information).
- The `refs` folder stores pointers for branches and tags, which are essentially files containing a specific commit hash.
- The `logs` folder records the history of changes to HEAD and branches, which can be queried via `git reflog` and used to restore accidentally deleted commits.
- The `index` file is a binary file that records the snapshot of the staging area after `git add`.
- The `HEAD` file records the branch or commit that is currently checked out in the working directory.
- Branches and tags are merely pointers to commits; Git's history is formed by the chain of parent pointers (each pointing to the previous commit) stored in the commit objects.
Analysis of the .git Directory Structure
All data stored by Git resides in the .git folder. Below are the purposes and operating mechanisms of the core components within this directory.
Directory Structure
- hooks: Stores custom scripts that run automatically at specific moments during Git operations (such as `commit`, `push`, and `merge`); suitable for automated testing or code style checks.
- info: Stores auxiliary information. The `exclude` file inside defines local exclusion rules; it serves the same purpose as `.gitignore` but applies only to a single developer's environment.
- logs: Records the update history of references (such as branches and HEAD).
  - When you might need this: if `git reset --hard` or `git rebase -i` makes commits disappear from a branch, you can read the records here via `git reflog` and restore them.
- objects: Stores all Git data objects (Blob, Tree, Commit).
  - Structure: Uses the first two characters of the SHA-1 hash as the directory name and the remaining 38 characters as the filename.
  - Object generation mechanism: Creating a commit involves three types of objects:
    - Blob object: Stores the actual content of a file.
    - Tree object: Stores the directory structure: entry names and the hash values of the corresponding Blob objects and sub-Trees.
    - Commit object: Stores the commit information (Tree hash, previous commit hash, author, and message).
- refs: Stores pointers for branches and tags.
  - heads: Stores local branches.
  - remotes: Stores remote-tracking branches.
  - tags: Stores tags.
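
The object naming scheme described above can be reproduced without Git. The following is a minimal sketch, assuming a SHA-1 repository and a loose (unpacked) blob object; the helper names `blob_id` and `loose_object_path` are illustrative and not part of Git.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Map an object ID to its location under .git/objects.

    The first two hex characters become the directory name and the
    remaining 38 become the filename.
    """
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid = blob_id(b"hello world\n")
    print(oid)
    print(loose_object_path(oid))
```

The ID printed here should match what `git hash-object` reports for a file with the same bytes. Objects that Git has packed are stored under `objects/pack` instead, so not every object can be found at a path like this.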
Key File Descriptions
- COMMIT_EDITMSG: Records the message of the last commit. This file is opened for editing when you run `git commit` or `git commit --amend`.
- config: Stores Git settings specific to the repository.
- index: A binary file that records the staging area: the snapshot from the latest commit plus the file information added via `git add`.
- HEAD: Records the currently checked-out branch or commit. If it points to a branch, the content is `ref: refs/heads/branch-name`; in a detached HEAD state, it stores the commit hash directly.
- ORIG_HEAD: Stores the position of HEAD before operations that move it drastically, such as `git reset` or `git merge`, so that the earlier state can be restored.
- FETCH_HEAD: Records the result of each `git fetch`, one line per fetched ref, formatted as `{Commit SHA-1} [not-for-merge] branch '{branch name}' of {remote repository URL}`.
  - When you might see this: after `git fetch`, a fetched branch that is not going to be merged into the current branch is marked `[not-for-merge]`.
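
The two forms of the HEAD file can be told apart with a few lines of code. This is a minimal sketch, assuming 40-character SHA-1 hashes; `HeadState` and `parse_head` are hypothetical names introduced for illustration, not Git APIs.

```python
import re
from typing import NamedTuple, Optional


class HeadState(NamedTuple):
    detached: bool
    branch: Optional[str]  # e.g. "main" when HEAD contains "ref: refs/heads/main"
    commit: Optional[str]  # the commit hash when HEAD is detached


def parse_head(text: str) -> HeadState:
    """Interpret the contents of a .git/HEAD file."""
    value = text.strip()
    if value.startswith("ref: "):
        ref = value[len("ref: "):]
        prefix = "refs/heads/"
        # Strip the refs/heads/ prefix to get the plain branch name.
        branch = ref[len(prefix):] if ref.startswith(prefix) else ref
        return HeadState(detached=False, branch=branch, commit=None)
    if re.fullmatch(r"[0-9a-f]{40}", value):
        # A bare hash means HEAD points at a commit directly (detached HEAD).
        return HeadState(detached=True, branch=None, commit=value)
    raise ValueError(f"unrecognized HEAD content: {value!r}")


if __name__ == "__main__":
    print(parse_head("ref: refs/heads/main\n"))
    print(parse_head("3b18e512dba79e4c8300dd08aeb37f8e728b8dad\n"))
```

In a real repository the input would come from reading `.git/HEAD`; in the attached case, the branch's own commit hash would then usually be found in the matching file under `refs/heads`.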
The Essence of Branches and Version Control
As evidenced by Git's data structure, branches and tags are essentially just pointers to specific commit objects.
- Branch: A pointer that automatically moves forward to the new commit each time you commit on that branch.
- Tag: A pointer that stays on a fixed commit object.
Git's history is linked together by the "previous commit hash" stored inside each commit object. When you run `git reset` or `git rebase`, the branch pointer moves, but the old commit objects still exist in the `objects` folder until garbage collection eventually prunes them, which is why history can be recovered via `git reflog`.
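
The pointer model can be illustrated with a toy in-memory version. This is a simplified sketch, not how Git is implemented: the commit IDs `c1` to `c3` are made-up stand-ins for real hashes, each commit has a single parent (merge commits have more), and `history` and `reset_hard` are hypothetical helpers.

```python
from typing import Dict, List, Optional

# Toy object store: commit ID -> parent commit ID (None for the first commit).
commits: Dict[str, Optional[str]] = {"c1": None, "c2": "c1", "c3": "c2"}

# A branch is just a name that points at a commit ID.
branches: Dict[str, str] = {"main": "c3"}


def history(commit_id: Optional[str]) -> List[str]:
    """Follow parent pointers from a commit back to the first commit."""
    chain: List[str] = []
    while commit_id is not None:
        chain.append(commit_id)
        commit_id = commits[commit_id]
    return chain


def reset_hard(branch: str, target: str, reflog: List[str]) -> None:
    """Move a branch pointer, recording its old position first."""
    reflog.append(branches[branch])
    branches[branch] = target


if __name__ == "__main__":
    reflog: List[str] = []
    print(history(branches["main"]))  # ['c3', 'c2', 'c1']
    reset_hard("main", "c1", reflog)
    print(history(branches["main"]))  # ['c1']
    print("c3" in commits)            # True: the commit object was not deleted
    print(history(reflog[-1]))        # ['c3', 'c2', 'c1']
```

Moving the pointer changes what the branch can reach, but nothing is removed from the store, so the recorded old position is enough to walk the full chain again.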
Change Log
- 2024-07-31 Initial document creation.
- 2024-09-20 Removed the description of a `.gitconfig` file in the repository root: it does not take effect, so it cannot be used to put settings under version control.